79 research outputs found

Exploration de la dynamique humaine basée sur des données massives de réseaux sociaux de géolocalisation : analyse et applications

Author: YANG Dingqi
Publication venue: HAL CCSD
Publication date: 27/01/2015

Human dynamics is an essential aspect of human-centric computing. As a transdisciplinary research field, it focuses on understanding the underlying patterns, relationships, and changes of human behavior. By exploring human dynamics, we can understand not only individuals' behavior, such as presence at a specific place, but also collective behaviors, such as social movements. Understanding human dynamics can thus enable various applications, such as personalized location-based services. However, before the availability of ubiquitous smart devices (e.g., smartphones), it was practically hard to collect large-scale human behavior data. With the ubiquity of GPS-equipped smartphones, location-based social media has gained increasing popularity in recent years, making large-scale user activity data attainable. Via location-based social media, users can share their activities as real-time presences at Points of Interest (POIs), such as a restaurant or a bar, within their social circles. Such data brings an unprecedented opportunity to study human dynamics. In this dissertation, based on large-scale location-centric social media data, we study human dynamics from both individual and collective perspectives. From the individual perspective, we study user preference on POIs with different granularities and its applications in personalized location-based services, as well as the spatial-temporal regularity of user activities. From the collective perspective, we explore global-scale collective activity patterns at both country and city granularities, and also identify their correlations with diverse human cultures.

Human dynamics is an essential topic in human-centric computing. It focuses on understanding the underlying regularities, relationships, and changes in human behavior. By analyzing human dynamics, we can understand not only individual behaviors, such as a person's presence at a specific place, but also collective behaviors, such as social movements. Exploring human dynamics thus enables various applications, including personalized location-based services in smart city scenarios. With the ubiquity of GPS-equipped smartphones, location-based social networks have gained increasing popularity in recent years, making user behavior data available at large scale. On these location-based social networks, users can share their activities in real time by checking in at points of interest (POIs), such as a restaurant. This activity data contains a wealth of information about human dynamics. In this thesis, we explore human dynamics based on massive data from location-based social networks. Specifically, from the individual perspective, we study user preference for POIs at different granularities and its applications, as well as the spatio-temporal regularity of user activities. From the collective perspective, we explore collective activity patterns at country and city granularities, as well as their correlation with global cultures.

Thèses en Ligne

Theses.fr

POIsketch: Semantic Place Labeling over User Activity Streams

Author: Cudré-Mauroux Philippe
Li Bin
Yang Dingqi
Publication date: 09/12/2016

RERO DOC Digital Library

PrivCheck: Privacy-Preserving Check-in Data Publishing for Personalized Location Based Services

Author: Cudré-Mauroux Philippe
Qu Bingqing
Yang Dingqi
Zhang Daqing
Publication date: 12/09/2016

With the widespread adoption of smartphones, we have observed an increasing popularity of Location-Based Services (LBSs) in the past decade. To improve user experience, LBSs often provide personalized recommendations to users by mining their activity (i.e., check-in) data from location-based social networks. However, releasing user check-in data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users' check-in data. In this paper, we propose PrivCheck, a customizable and continuous privacy-preserving check-in data publishing framework providing users with continuous privacy protection against inference attacks. The key idea of PrivCheck is to obfuscate user check-in data such that the privacy leakage of user-specified private data is minimized under a given data distortion budget, which ensures the utility of the obfuscated data to empower personalized LBSs. Since users often give LBS providers access to both their historical check-in data and future check-in streams, we develop two data obfuscation methods for historical and online check-in publishing, respectively. An empirical evaluation on two real-world datasets shows that our framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized LBSs.

RERO DOC Digital Library

HAL-Rennes 1

Histosketch: fast similarity-preserving sketching of streaming histograms with concept drift

Author: Cudré-Mauroux Philippe
Li Bin
Rettig Laura
Yang Dingqi
Publication date: 17/04/2018

Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming data, where the elements of a histogram are observed in a streaming manner. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, the concept-drift issue in the data streams also impairs the accurate assessment of the similarity. In this paper, we propose to overcome the above challenges with HistoSketch, a fast similarity-preserving sketching method for streaming histograms with concept drift. Specifically, HistoSketch is designed to incrementally maintain a set of compact and fixed-size sketches of streaming histograms to approximate similarity between the histograms, with the special consideration of gradually forgetting the outdated histogram elements. We evaluate HistoSketch on multiple classification tasks using both synthetic and real-world datasets. The results show that our method is able to efficiently approximate similarity for streaming histograms and quickly adapt to concept drift. Compared to full streaming histograms that gradually forget outdated elements, HistoSketch is able to dramatically reduce the classification time (with a 7500x speedup) with only a modest loss in accuracy (about 3.5%).
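As a rough illustration of the baseline mentioned in the abstract above (a full streaming histogram that gradually forgets outdated elements), the following sketch keeps exponentially decayed counts and compares two histograms with a min-max (generalized Jaccard) similarity. It is not the HistoSketch algorithm itself; the class name `DecayedHistogram`, the `decay` parameter, and the choice of similarity measure are assumptions made for this example.

```python
from collections import defaultdict


class DecayedHistogram:
    """Streaming histogram whose older counts fade by a constant factor.

    Hypothetical helper for illustration only; not taken from the paper.
    """

    def __init__(self, decay=0.99):
        self.decay = decay                  # assumed forgetting factor in (0, 1]
        self.counts = defaultdict(float)

    def add(self, element):
        # Down-weight everything seen so far, then count the new element.
        for key in self.counts:
            self.counts[key] *= self.decay
        self.counts[element] += 1.0


def minmax_similarity(h1, h2):
    """Generalized Jaccard similarity: sum of minima over sum of maxima."""
    keys = set(h1.counts) | set(h2.counts)
    num = sum(min(h1.counts.get(k, 0.0), h2.counts.get(k, 0.0)) for k in keys)
    den = sum(max(h1.counts.get(k, 0.0), h2.counts.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0
```

Every update touches all stored elements and the histogram grows with the number of distinct elements, which is the kind of cost a fixed-size sketch is meant to avoid.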

RERO DOC Digital Library

Geographic differential privacy for mobile crowd coverage maximization

Author: Han Xiao
Ma Xiaojuan
Qin Gehua
Wang Leye
Yang Dingqi
Publication date: 18/11/2017

For real-world mobile applications such as location-based advertising and spatial crowdsourcing, a key to success is targeting mobile users that can maximally cover certain locations in a future period. To find an optimal group of users, existing methods often require information about users' mobility history, which may cause privacy breaches. In this paper, we propose a method to maximize a mobile crowd's future location coverage under a guaranteed location privacy protection scheme. In our approach, users only need to upload one of their frequently visited locations, and more importantly, the uploaded location is obfuscated using a geographic differential privacy policy. We propose both analytic and practical solutions to this problem. Experiments on real user mobility datasets show that our method significantly outperforms the state-of-the-art geographic differential privacy methods by achieving a higher coverage under the same level of privacy protection.

arXiv.org e-Print Archive

RERO DOC Digital Library

Association for the Advancement of Artificial Intelligence: AAAI Publications

Engineering a Simplified 0-Bit Consistent Weighted Sampling

Author: Chum O.
Raff Edward
Shrivastava Anshumali
Yang Dingqi
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/10/2018

The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used and provides state-of-the-art performance for this problem space. However, ICWS suffers a computational burden as the sketch size K increases. We develop a new Simplified approach to the ICWS algorithm that enables us to obtain over 20x speedups compared to the standard algorithm. The veracity of our approach is demonstrated empirically on multiple datasets and scenarios, showing that our new Simplified CWS obtains the same quality of results while being an order of magnitude faster.
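For orientation, here is a minimal sketch of the standard ICWS procedure that the abstract above takes as its starting point, with a 0-bit option that keeps only the selected feature index. It is not the Simplified variant proposed in the paper. The function names, the dictionary input format, and the per-(hash, feature) string seeding used to make the random draws consistent across inputs are assumptions made for this example.

```python
import math
import random


def icws_sketch(weights, num_hashes=64, zero_bit=True):
    """Sketch a non-empty dict {feature: positive weight} with standard ICWS.

    With zero_bit=True only the selected feature is kept per hash (0-bit CWS);
    otherwise the pair (feature, t) is kept.
    """
    sketch = []
    for i in range(num_hashes):
        best = None
        for k, s in weights.items():
            if s <= 0:
                continue
            # Same (hash index, feature) must yield the same draws for every input.
            rng = random.Random(f"{i}:{k}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(s) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if best is None or a < best[0]:
                best = (a, k, t)
        sketch.append(best[1] if zero_bit else (best[1], best[2]))
    return sketch


def estimate_similarity(sketch_a, sketch_b):
    """Fraction of matching positions, an estimate of min-max similarity."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

The inner loop runs once per feature for each of the K hashes, which illustrates the cost that grows with the sketch size K.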

arXiv.org e-Print Archive

Crossref

CrimeTelescope: crime hotspot prediction based on urban and social media data fusion

Author: Cudré-Mauroux Philippe
Heaney Terence
Tonon Alberto
Wang Leye
Yang Dingqi
Publication date: 10/04/2018

Crime is a complex social issue impacting a considerable number of individuals within a society. Preventing and reducing crime is a top priority in many countries. Given limited policing and crime reduction resources, it is often crucial to identify effective strategies to deploy the available resources. Towards this goal, crime hotspot prediction has previously been suggested. Crime hotspot prediction leverages past data in order to identify geographical areas likely to host crimes in the future. However, most of the existing techniques in crime hotspot prediction solely use historical crime records to identify crime hotspots, while ignoring the predictive power of other data such as urban or social media data. In this paper, we propose CrimeTelescope, a platform that predicts and visualizes crime hotspots based on a fusion of different data types. Our platform continuously collects crime data as well as urban and social media data on the Web. It then extracts key features from the collected data based on both statistical and linguistic analysis. Finally, it identifies crime hotspots by leveraging the extracted features, and offers visualizations of the hotspots on an interactive map. Based on real-world data collected from New York City, we show that combining different types of data can effectively improve the crime hotspot prediction accuracy (by up to 5.2%), compared to classical approaches based on historical crime records only. In addition, we demonstrate the usability of our platform through a System Usability Scale (SUS) survey on a full prototype of CrimeTelescope.

RERO DOC Digital Library

Location privacy-preserving task allocation for mobile crowdsensing with differential geo-obfuscation

Author: Han Xiao
Ma Xiaojuan
Wang Leye
Wang Tianben
Yang Dingqi
Zhang Daqing
Publication date: 10/04/2018

In traditional mobile crowdsensing applications, organizers need participants' precise locations for optimal task allocation, e.g., minimizing selected workers' travel distance to task locations. However, the exposure of their locations raises privacy concerns. Especially for those who are not eventually selected for any task, their location privacy is sacrificed in vain. Hence, in this paper, we propose a location privacy-preserving task allocation framework with geo-obfuscation to protect users' locations during task assignments. Specifically, we make participants obfuscate their reported locations under the guarantee of differential privacy, which can provide privacy protection regardless of adversaries' prior knowledge and without the involvement of any third-party entity. In order to achieve optimal task allocation with such differential geo-obfuscation, we formulate a mixed-integer non-linear programming problem to minimize the expected travel distance of the selected workers under the constraint of differential privacy. Evaluation results on both simulation and real-world user mobility traces show the effectiveness of our proposed framework. Particularly, our framework outperforms Laplace obfuscation, a state-of-the-art differential geo-obfuscation mechanism, by achieving 45% less average travel distance on the real-world data.
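The Laplace obfuscation baseline named in the abstract above is commonly realized as a planar Laplace mechanism: the reported point is the true point shifted in a uniformly random direction by a distance drawn from a Gamma distribution with shape 2 and scale 1/epsilon. The sketch below illustrates that baseline only, not the optimized obfuscation proposed in the paper. It assumes locations are already projected to planar coordinates in meters; the function name `planar_laplace` and its parameters are chosen for this example.

```python
import math
import random


def planar_laplace(x, y, epsilon, rng=random):
    """Return an obfuscated copy of the planar point (x, y).

    epsilon is the privacy parameter per unit of distance (assumed per meter);
    smaller values add more noise. rng may be the random module or a
    random.Random instance for reproducibility.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)        # random direction
    radius = rng.gammavariate(2.0, 1.0 / epsilon)  # distance: shape 2, scale 1/epsilon
    return x + radius * math.cos(theta), y + radius * math.sin(theta)
```

Under this mechanism the expected displacement is 2/epsilon, so epsilon = 0.01 per meter moves a reported location by about 200 m on average.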

RERO DOC Digital Library

Privacy-preserving social media data publishing for personalized ranking-based recommendation

Author: Cudré-Mauroux Philippe
Qu Bingqing
Yang Dingqi
Publication date: 04/04/2019

Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user data, in particular users' online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users' activity data. In this paper, we propose PrivRank, a customizable and continuous privacy-preserving social media data publishing framework protecting users against inference attacks while enabling personalized ranking-based recommendations. Its key idea is to continuously obfuscate user activity data such that the privacy leakage of user-specified private data is minimized under a given data distortion budget, which bounds the ranking loss incurred from the data obfuscation process in order to preserve the utility of the data for enabling recommendations. An empirical evaluation on both synthetic and real-world datasets shows that our framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized ranking-based recommendation. Compared to state-of-the-art approaches, PrivRank achieves both better privacy protection and higher utility in all the ranking-based recommendation use cases we tested.

RERO DOC Digital Library

Knowledge graph embeddings

Author: Cudré-Mauroux Philippe
Rosso Paolo
Yang Dingqi
Publication date: 04/04/2019

With the growing popularity of multi-relational data on the Web, knowledge graphs (KGs) have become a key data source in various application domains, such as Web search, question answering, and natural language understanding. In a typical KG such as Freebase (Bollacker et al. 2008) or Google’s Knowledge Graph (Google 2014), entities are connected via relations. For example, Bern is capital of Switzerland. Formally, a popular approach to represent such relational data is to use the Resource Description Framework. It defines a fact as a triple (subject, predicate, and object), which is also known as head, relation, and tail or (h, r, t) for short. Following the above example, the head, relation, and tail..

RERO DOC Digital Library